ABSTRACT

Spatial co-location patterns represent the subsets of events whose instances are frequently located together in geographic space. We identified the computational bottleneck in the execution time of a current co-location mining algorithm. A large fraction of the execution time of the join-based co-location mining algorithm is devoted to computing joins to identify instances of candidate co-location patterns. We propose a novel partial-join approach for mining co-location patterns efficiently. It transactionizes continuous spatial data while keeping track of the spatial information not modeled by transactions. It uses a transaction-based Apriori algorithm as a building block and adopts the instance join method for residual instances not identified in transactions. We show that the algorithm is correct and complete in finding all co-location rules which have prevalence and conditional probability above the given thresholds. An experimental evaluation using synthetic datasets and a real dataset shows that our algorithm is computationally more efficient than the join-based algorithm.

1. INTRODUCTION

A co-location represents a subset of spatial boolean events whose instances are often located in a neighborhood. Boolean spatial events describe the presence or absence of geographic object types at different locations in a two-dimensional or three-dimensional metric space, e.g., the surface of the Earth. Examples of boolean spatial events include business types, mobile service requests, disease, crime, climate, plant species, etc. Spatial co-location patterns may yield important insights for many applications. For example, a mobile service provider may be interested in service patterns frequently requested in close proximity, e.g., ‘today sales’ and ‘nearby stores’. The frequent neighboring request sets may be used for providing attractive location-sensitive advertisements, promotions, etc. Other application domains for co-locations are Earth science, environmental management, government services, public health, public safety, transportation, tourism, etc.

Co-location rule discovery is a process to identify co-location patterns from an instance dataset of spatial boolean events.

It is not trivial to adopt association rule mining algorithms [1, 8, 13, 18] to mine co-location patterns since instances of spatial events are embedded in a continuous space and share a variety of spatial relationships. Reusing association rule algorithms may require transactionizing spatial datasets, which is challenging due to the risk of transaction boundaries splitting co-location pattern instances across distinct transactions. Figure 1 (a) shows an example spatial dataset with three spatial events, A, B, and C. Each instance is represented by its event type and unique instance id, e.g., A.1. Solid lines show neighbor relationships over event instances. For example, {A.2, B.4, C.2} and {A.3, B.3, C.1} are the instances of co-location {A, B, C} since their event instances are neighbors of each other. Figure 1 (b) shows the problem of explicit transactionization. Rectangular grids are used to produce transactions over the spatial dataset. As can be seen by the solid line circle, the only identified instance of co-location {A, B, C} is {A.2, B.4, C.2}. The instance {A.3, B.3, C.1} is missed due to the split caused by the transaction boundaries.

Related Work: In previous work on co-location pattern discovery, a few approaches have been developed to identify instances of candidate co-location patterns. One approach [12] groups neighboring instances arbitrarily under a non-overlapping instance grouping constraint. This disjoint grouping method may yield different instance sets depending on the order of grouping. For example, Figure 1 (c) illustrates the different instance sets of co-location {A, B, C} that result from different orders of grouping the instances of the size 2 co-location {A, B}. If the instance {A.4, B.3} is grouped first, the instance {A.3, B.3} is not identified since B.3 already belongs to {A.4, B.3}, even though {A.3, B.3} is also a neighborhood instance. Consequently, the instance {A.3, B.3, C.1} of co-location {A, B, C} is also not found.

Another approach [15] generates the instances of candidate co-locations, without missing any, by using an instance join method. For example, in Figure 1 (d), the instances of co-location {A, B} and the instances of co-location {A, C} are joined and their neighbor relations are checked to generate the instances of co-location {A, B, C}. Both {A.2, B.4, C.2} and {A.3, B.3, C.1} are correctly generated. The join-based algorithm may be useful in analyzing datasets of sparse instances. However, scaling the algorithm to substantially large, dense spatial datasets is challenging due to the increasing number of co-location patterns and their instances. Other co-location mining work [17] presents a framework for extended spatial objects, e.g., polygons and line strings. It also uses an instance join method to identify nearby spatial objects.

[Figure 1 panels. (a) An example dataset. (b) Explicit transactionization: {A, B, C}'s instances = {(A.2, B.4, C.2)}. (c) Non-overlapping grouping: {A, B, C}'s instances = {(A.2, B.4, C.2)} or {(A.2, B.4, C.2), (A.3, B.3, C.1)}. (d) Instance join of the tables
  {A, B}: (A.1, B.1), (A.2, B.4), (A.3, B.3), (A.4, B.3)
  {A, C}: (A.1, C.1), (A.2, C.2), (A.3, C.1), (A.3, C.3)
  {B, C}: (B.3, C.1), (B.4, C.2)
  giving {A, B, C}: (A.2, B.4, C.2), (A.3, B.3, C.1)]

Figure 1: Examples to illustrate different approaches to discover co-location patterns. (b) An explicit transactionization of a spatial dataset can split instances of co-locations. (c) The non-overlapping grouping method can generate different sets of instances. (d) The instance join method generates complete instances but its computation is expensive.

This paper proposes a novel approach for efficient co-location pattern mining. We make the following contributions.

Our Contributions: First, we identified the computational bottleneck in the execution time of the join-based co-location mining algorithm [15]. A large fraction of the algorithm's execution time is devoted to computing joins to identify instances of candidate co-location patterns. Second, we propose a novel partial-join approach for mining co-location patterns efficiently. It transactionizes continuous spatial data while keeping track of the spatial information not modeled by transactions. This approach is based on an important observation that only event instances having at least one cut neighbor relation are related to co-location instances split over transactions. Third, we present an efficient co-location mining algorithm to concretize the partial-join approach. It uses a transaction-based Apriori algorithm [1] as a building block and adopts the instance join method [15] of the join-based co-location mining algorithm for generating residual co-location instances not identified by transactions. Fourth, we prove that the partial join algorithm is correct and complete in finding all co-location rules with prevalence and conditional probability above the given thresholds. Fifth, we provide an algebraic cost model to characterize the zones in which our partial-join algorithm or the join-based algorithm dominates in performance. Finally, we conducted experiments using a real dataset as well as synthetic datasets. The experimental evaluation shows that our algorithm is computationally more efficient than the full join-based mining algorithm.

The remainder of the paper is organized as follows. Section 2 presents an overview of basic concepts of co-location pattern mining. In Section 3, we present the partial join approach for efficient co-location mining. Section 4 describes the partial join co-location mining algorithm. The proofs of correctness and completeness of the algorithm, and an algebraic cost model, are given in Section 5. Section 6 presents experimental evaluations. We give the conclusion and discuss future work in Section 7.

2. CO-LOCATION PATTERN MINING:
BASIC CONCEPTS

This section describes the basic concepts for mining co-location patterns.

Given a set of boolean spatial events E = {e1, ..., ek}, a set S of their instances {i1, ..., in}, and a reflexive and symmetric neighbor relation R over S, a co-location C is a subset of boolean spatial events, i.e., C ⊆ E, whose instances I ⊆ S form a clique [3] using the neighbor relation R. For simplicity, we use a metric-based neighbor relation R: two event instances i1 and i2 are neighbors, neighbor(i1, i2), if Euclidean_distance(i1, i2) ≤ a user-specified threshold.
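The metric neighbor relation and the clique condition can be sketched in Python as below. This is a minimal illustration under assumptions: the `(event type, instance id, (x, y))` record layout and the function names `is_neighbor` and `forms_clique` are hypothetical, and the distance threshold `d` is treated as inclusive.

```python
import math
from itertools import combinations

# Hypothetical instance record: (event type, instance id, (x, y)).
Instance = tuple[str, int, tuple[float, float]]


def is_neighbor(a: Instance, b: Instance, d: float) -> bool:
    """Metric neighbor relation R: Euclidean distance of at most d.

    R is reflexive (distance 0) and symmetric (math.dist is symmetric).
    """
    return math.dist(a[2], b[2]) <= d


def forms_clique(instances: list[Instance], d: float) -> bool:
    """True if every pair of instances is related by R, i.e. they form a clique."""
    return all(is_neighbor(a, b, d) for a, b in combinations(instances, 2))
```

For example, instances located at (0, 0), (0.6, 0) and (0.3, 0.5) form a clique for d = 1, because all three pairwise distances are at most 0.6.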

A co-location rule is of the form C1 → C2 (p, cp), where C1 and C2 are disjoint co-locations, p is a value representing the prevalence measure, and cp is the conditional probability.

A neighborhood instance I of a co-location C is a row instance (simply, instance) of C if I contains instances of all events in C and no proper subset of I does so. For example, in Figure 1 (d), {A.1, B.1} is a row instance of co-location {A, B}. {A.3, C.1, C.3} is a neighborhood in Figure 1 (a), but it is not a row instance of co-location {A, C} because its subset {A.3, C.1} contains instances of all events in {A, C}. The table instance of a co-location C is the collection of all row instances of C. For example, the table instance of {B, C} in Figure 1 (d) has two row instances, {B.3, C.1} and {B.4, C.2}.
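A brute-force enumeration of table instances on the running example can be sketched as follows. It assumes that the neighbor relation consists only of the cross-type pairs listed in the Figure 1(d) tables, and that event type B has five instances (B.1 to B.5), as the later participation index example implies; the names `EDGES`, `INSTANCES` and `table_instance` are hypothetical. Because each row takes exactly one instance per event type, no proper subset of a row can cover all event types, so the minimality condition of a row instance holds by construction.

```python
from itertools import combinations, product

# Neighbor pairs between different event types, read from the Figure 1(d) tables.
EDGES = {frozenset(p) for p in [
    ("A.1", "B.1"), ("A.2", "B.4"), ("A.3", "B.3"), ("A.4", "B.3"),
    ("A.1", "C.1"), ("A.2", "C.2"), ("A.3", "C.1"), ("A.3", "C.3"),
    ("B.3", "C.1"), ("B.4", "C.2"),
]}
INSTANCES = {"A": ["A.1", "A.2", "A.3", "A.4"],
             "B": ["B.1", "B.2", "B.3", "B.4", "B.5"],
             "C": ["C.1", "C.2", "C.3"]}


def neighbors(a, b):
    return frozenset((a, b)) in EDGES


def table_instance(colocation):
    """All row instances: one instance per event type, pairwise neighbors (a clique)."""
    events = sorted(colocation)
    rows = []
    for row in product(*(INSTANCES[e] for e in events)):
        if all(neighbors(a, b) for a, b in combinations(row, 2)):
            rows.append(row)
    return rows
```

Calling `table_instance({"B", "C"})` returns the two row instances (B.3, C.1) and (B.4, C.2) named in the text. This exhaustive product is only practical for toy data; it is exactly the kind of work the join-based and partial-join algorithms try to organize more efficiently.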


The conditional probability, Pr(C2 | C1), of a co-location rule C1 → C2 is the probability of finding an instance of C2 in the neighborhood of an instance of C1. Formally, it is estimated as

$$\Pr(C_2 \mid C_1) = \frac{|\pi_{C_1}(\text{table instance of } C_1 \cup C_2)|}{|\text{table instance of } C_1|},$$

where π is a projection operation with duplicate elimination.

The participation index, PI(C), is used as a co-location prevalence measure. The participation index of a co-location C = {e1, ..., ek} is defined as

$$PI(C) = \min_{e_i \in C} \{\Pr(C, e_i)\},$$

where Pr(C, ei) is the participation ratio for event type ei in a co-location C. Pr(C, ei) is the fraction of instances of ei which participate in any instance of co-location C, i.e.,

$$\Pr(C, e_i) = \frac{|\pi_{e_i}(\text{table instance of } C)|}{|\text{table instance of } e_i|},$$

where π is a projection operation with duplicate elimination. For example, in Figure 1 (a), the total number of instances of event type A is 4 and the total number of instances of event type C is 3. From Figure 1 (d), the participation index of co-location c = {A, C} is min{Pr(c, A), Pr(c, C)} = 3/4 because Pr(c, A) is 3/4 and Pr(c, C) is 3/3. A high participation index value indicates that the spatial events in a co-location pattern are likely to show up together.
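The participation ratio, participation index and conditional probability defined above can be checked on the {A, C} example with a short sketch. It assumes table rows are tuples whose columns follow the sorted order of event types, and, for a single-event antecedent, that the size of the table instance of {e} is simply the number of instances of e; the function names are hypothetical. Exact fractions are used so the results match the hand computation.

```python
from fractions import Fraction

# Table instance of co-location {A, C} from Figure 1(d); columns follow sorted event order.
TABLE_AC = [("A.1", "C.1"), ("A.2", "C.2"), ("A.3", "C.1"), ("A.3", "C.3")]
# Total number of instances of each event type in Figure 1(a).
TOTAL = {"A": 4, "C": 3}


def participation_ratio(table, events, e, total):
    """Pr(C, e): distinct instances of e in the table (projection) over all instances of e."""
    col = sorted(events).index(e)
    return Fraction(len({row[col] for row in table}), total[e])


def participation_index(table, events, total):
    """PI(C): the minimum participation ratio over the event types of C."""
    return min(participation_ratio(table, events, e, total) for e in events)


def conditional_probability(table, events, antecedent, antecedent_table_size):
    """Pr(C2 | C1): distinct C1-projections of the C1 ∪ C2 table over |table instance of C1|."""
    order = sorted(events)
    cols = [order.index(e) for e in sorted(antecedent)]
    projected = {tuple(row[c] for c in cols) for row in table}
    return Fraction(len(projected), antecedent_table_size)
```

With these definitions, the participation index of {A, C} evaluates to 3/4, the rule {A} → {C} has conditional probability 3/4, and the rule {C} → {A} has conditional probability 1.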


Lemma 1. The participation ratio and the participation index are monotonically non-increasing as the size of the co-location increases.

PROOF. Please refer to [15] for the proof. □


Lemma 1 ensures that the participation index can be used to effectively prune the search space of co-location pattern mining.


3. PARTIAL JOIN APPROACH FOR
CO-LOCATION PATTERN MINING

This section defines our partial join approach for efficient co-location pattern mining.

3.1 Problem Definition

We formalize the co-location mining problem as follows:

Given:

1) A set of k spatial event types E = {e1, ..., ek} and a set of their instances S = {i1, ..., in}, where each i ∈ S is a vector <instance id, spatial event type, location> and location ∈ a spatial framework

2) A symmetric and reflexive neighbor relation R over locations

3) A minimal prevalence threshold (min_prev) and a minimal conditional probability threshold (min_cond_prob)

Find:

Find a correct and complete set of co-location rules with participation index > min_prev and conditional probability > min_cond_prob.

Objective:

Minimize computation cost.

Constraints:

1) R is a distance metric based neighbor relation.

2) Ignore edge effects in R.

3) Correct and complete in finding all co-location rules satisfying the given thresholds.

4) The spatial dataset is a point dataset.

3.2 Partial Join Approach

The basic idea of the partial join approach is to reduce the number of instance joins for identifying instances of candidate co-locations by transactionizing a spatial dataset under a neighbor relationship and tracing only the residual neighborhood instances cut apart by the transactions. The key component of our approach is how we identify instances of co-locations split across explicit transactions. It is based on the observation that only event instances having at least one cut neighbor relationship are related to the neighborhood instances split over transactions. To formalize this idea, we provide a set of definitions of key terms related to the partial join approach.

Definition 1. A neighborhood transaction (simply, transaction) is a set of instances T ⊆ S that forms a clique using a neighbor relation R. A spatial dataset S is partitioned into a set of disjoint transactions {T1, ..., Tn}, where Ti ∩ Tj = ∅ for i ≠ j and T1 ∪ ... ∪ Tn = S.

We assume a spatial dataset S can be partitioned into a set of distinct transactions, i.e., each event instance i ∈ S belongs to exactly one transaction. For example, Figure 3 shows a set of transactions on the same example spatial dataset as Figure 1 (a). The dashed circle represents a neighborhood region centered at an arbitrary location on a spatial framework. The instances within the dashed circle are neighbors of each other and thus form a transaction. For example, B.2 and B.5 form a transaction. A spatial dataset can be transactionized differently according to the partitioning method used. Thus the transactions generated using rectangular grids in Figure 1 (b) differ slightly from the transactions illustrated in Figure 3. For example, in Figure 3, {A.3, C.1, C.3} forms a single transaction. By contrast, in Figure 1 (b), it is divided into two transactions, {A.3, C.3} and {C.1}. We will examine the effect of different transactionization methods in future work.

Definition 2. A row instance I of a co-location C is an intraX row instance (simply, intraX instance) of C if all instances i ∈ I belong to a common transaction T. The intraX table instance of C is the collection of all intraX row instances of C.

For example, in Figure 3, {A.3, C.1} is an intraX instance of co-location {A, C} but {A.1, C.1} is not, since its event instances A.1 and C.1 are members of different transactions. The intraX table instance of {A, C} consists of {A.3, C.1}, {A.3, C.3} and {A.2, C.2}.

Definition 3. A neighbor relation r ∈ R between two event instances i1, i2 ∈ S, i1 ≠ i2, is called a cut neighbor relation if i1 and i2 are neighbors of each other but belong to distinct transactions.

Figure 3 presents cut neighbor relations as dotted lines. The pairs {A.1, C.1}, {A.3, B.3} and {B.3, C.1} have cut neighbor relations.

Definition 4. A row instance I of a co-location C is an interX row instance (simply, interX instance) of C if all instances i ∈ I have at least one cut neighbor relation. The interX table instance of C is the collection of all interX row instances of C.

For example, in Figure 3, {A.3, B.3} is an interX instance of co-location {A, B} because A.3 has a cut neighbor relation with B.3, and B.3 also has cut neighbor relations with A.3 and with C.1. Note that {A.3, C.1} is an interX instance as well as an intraX instance of {A, C}. The interX table instance of {A, C} has two interX instances, {A.1, C.1} and {A.3, C.1}.
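Definitions 2 to 4 can be applied mechanically once each instance has a transaction number. The sketch below assumes a transaction assignment inferred from the Figure 3 examples in the text rather than read off the figure: T1 = {A.1, B.1}, T2 = {A.2, B.4, C.2}, T3 = {A.3, C.1, C.3}, T4 = {A.4, B.3}, T5 = {B.2, B.5}. The neighbor pairs are the cross-type pairs of Figure 1(d); same-type neighbors such as C.1 and C.3 lie inside one transaction and are omitted. All names are hypothetical.

```python
# Transaction number of each instance (assumption: inferred from the Figure 3 examples).
TRANS = {"A.1": 1, "B.1": 1, "A.2": 2, "B.4": 2, "C.2": 2, "A.3": 3, "C.1": 3,
         "C.3": 3, "A.4": 4, "B.3": 4, "B.2": 5, "B.5": 5}
# Neighbor pairs between different event types, from the Figure 1(d) tables.
EDGES = [("A.1", "B.1"), ("A.2", "B.4"), ("A.3", "B.3"), ("A.4", "B.3"),
         ("A.1", "C.1"), ("A.2", "C.2"), ("A.3", "C.1"), ("A.3", "C.3"),
         ("B.3", "C.1"), ("B.4", "C.2")]


def cut_relations(edges, trans):
    """Definition 3: neighbor pairs whose endpoints lie in different transactions."""
    return [(a, b) for a, b in edges if trans[a] != trans[b]]


def is_intrax(row, trans):
    """Definition 2: every instance of the row belongs to one common transaction."""
    return len({trans[i] for i in row}) == 1


def is_interx(row, cut_nodes):
    """Definition 4: every instance of the row has at least one cut neighbor relation."""
    return all(i in cut_nodes for i in row)


CUT_NODES = {i for pair in cut_relations(EDGES, TRANS) for i in pair}
TABLE_AC = [("A.1", "C.1"), ("A.2", "C.2"), ("A.3", "C.1"), ("A.3", "C.3")]
INTRAX_AC = [r for r in TABLE_AC if is_intrax(r, TRANS)]
INTERX_AC = [r for r in TABLE_AC if is_interx(r, CUT_NODES)]
```

Under this assignment the sketch reproduces the three cut neighbor relations of Figure 3, the intraX table instance {A.2, C.2}, {A.3, C.1}, {A.3, C.3} of {A, C}, and its interX table instance {A.1, C.1}, {A.3, C.1}. It also shows that (A.3, C.1) belongs to both.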

Figure 2 illustrates the possible instances of a size 3 co-location and of a size 4 co-location located over neighborhood transactions. Black dots signify event instances, circles are transactions, and lines show neighbor relations between two event instances; in particular, dotted lines signify cut neighbor relations. There are two types of co-location instances. In the first, all event instances of a co-location instance belong to a single transaction. In the second, the event instances are distributed across two or more transactions. The former is the case of an intraX instance and the latter is an interX instance. Note that every event instance of an interX instance is related to at least one cut neighbor relation (dotted lines).


Figure 2: The cases of possible instances of size 3 and of size 4 co-locations over transactions

Lemma 2. For a co-location C, the table instance of C is the union of the intraX table instance of C and the interX table instance of C.

PROOF. The table instance of a co-location C is the collection of all (row) instances of C. First, we show that any instance I = {i1, ..., in} of C is an intraX instance of C or an interX instance of C. If all event instances of I belong to a single neighborhood transaction, I is an intraX instance by Definition 2. Otherwise, every member of I has at least one other member of I in a different transaction; since I forms a clique using the neighbor relation, these two members are neighbors, so every member has at least one cut neighbor relation. Thus, I is an interX instance by Definition 4. Second, all instances of the intraX table instance and the interX table instance of C are row instances whose event instances form a clique, according to Definition 2 and Definition 4. □
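A brute-force sanity check of Lemma 2 on the running example can be sketched as follows. It is not a substitute for the proof: it only verifies, for one dataset, that every row instance is intraX or interX. It assumes the cross-type neighbor pairs of Figure 1(d) and the same transaction assignment inferred from the Figure 3 examples (T1 = {A.1, B.1}, T2 = {A.2, B.4, C.2}, T3 = {A.3, C.1, C.3}, T4 = {A.4, B.3}, T5 = {B.2, B.5}); all names are hypothetical.

```python
from itertools import combinations, product

INSTANCES = {"A": ["A.1", "A.2", "A.3", "A.4"],
             "B": ["B.1", "B.2", "B.3", "B.4", "B.5"],
             "C": ["C.1", "C.2", "C.3"]}
# Cross-type neighbor pairs from Figure 1(d).
EDGES = {frozenset(p) for p in [
    ("A.1", "B.1"), ("A.2", "B.4"), ("A.3", "B.3"), ("A.4", "B.3"),
    ("A.1", "C.1"), ("A.2", "C.2"), ("A.3", "C.1"), ("A.3", "C.3"),
    ("B.3", "C.1"), ("B.4", "C.2")]}
# Transaction numbers (assumption: inferred from the Figure 3 examples).
TRANS = {"A.1": 1, "B.1": 1, "A.2": 2, "B.4": 2, "C.2": 2, "A.3": 3, "C.1": 3,
         "C.3": 3, "A.4": 4, "B.3": 4, "B.2": 5, "B.5": 5}
# Instances having at least one cut neighbor relation.
CUT_NODES = {i for e in EDGES if len({TRANS[j] for j in e}) == 2 for i in e}


def table_instance(events):
    """All row instances: one instance per event type, pairwise neighbors."""
    return {row for row in product(*(INSTANCES[e] for e in sorted(events)))
            if all(frozenset(p) in EDGES for p in combinations(row, 2))}


def lemma2_holds(events):
    """Check that table instance == intraX table instance ∪ interX table instance."""
    rows = table_instance(events)
    intrax = {r for r in rows if len({TRANS[i] for i in r}) == 1}
    interx = {r for r in rows if all(i in CUT_NODES for i in r)}
    return rows == intrax | interx
```

For {A, B, C}, the row (A.2, B.4, C.2) is covered as an intraX instance (all in T2) and (A.3, B.3, C.1) as an interX instance, so the union recovers the full table instance.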

4. PARTIAL JOIN CO-LOCATION MINING
ALGORITHM

This section describes the partial join co-location mining algorithm. A transaction-based Apriori algorithm [1] is used as a building block to identify all intraX instances of co-locations. InterX instances are generated using the generalized apriori_gen function [15] of the join-based co-location mining algorithm. This approach is expected to provide a framework for efficient co-location mining since all instances in a transaction are neighbors of each other, and neither spatial operations nor combinatorial operations (i.e., joins) are required to find instances of candidate co-locations within a transaction, i.e., intraX instances. The computation cost of the instance join operations for generating only the interX instances not identified in the transactions is relatively lower than that of finding all instances of co-locations. The partial-join mining algorithm for co-location patterns is described as follows.

Figure 3: An illustration of the partial join co-location mining algorithm

Transactionization of a spatial dataset: Given a spatial dataset and a neighbor relation, the spatial dataset is partitioned to generate neighborhood transactions. Several partitioning methods can be adopted for neighborhood transactions, e.g., grids [14], maximal cliques [3], max-clique agglomerative clustering [20], min-cut partitioning [6], etc. The ideal method generates a set of maximal cliques while minimizing the number of edges cut by the partitions. In the case of a simple grid partitioning, rectangular grids of a proximity neighborhood size d × d, where d is the neighbor distance threshold, are imposed on a spatial framework, and the event instances in each cell are gathered into a transaction. Cut neighbor relations can be detected by examining all pairs (i1, i2) of instances in neighboring cells, i.e., (i1, i2) ∈ R and i1.trans_no ≠ i2.trans_no, where R is a neighbor relation. This can be implemented using geometric approaches, e.g., plane sweep [2], space partitioning [9], and tree matching [10]. Size 2 interX instances are generated from the cut neighbor pairs themselves, as well as from all pairs (i1, i2) of instances in the same transaction that both have cut neighbor relations, i.e., i1 ∈ B, i2 ∈ B and i1.trans_no = i2.trans_no, where B is the set of event instances having cut neighbor relations.
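Grid transactionization, cut neighbor detection and size 2 interX generation can be sketched together in Python. This is a sketch under stated assumptions, not the paper's implementation. It uses cells of side d/√2 instead of d × d so that any two points in a cell are within distance d and every cell is a clique, as Definition 1 requires. It finds cut pairs by checking cells at most two steps away rather than by plane sweep or tree matching. The function name `transactionize` echoes step 1 of Algorithm 1, but its signature and the `(event_type, x, y)` point layout are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def transactionize(points, d):
    """Grid-based transactionization sketch.

    points: dict mapping an instance id to (event_type, x, y).
    Returns (trans_no, cut, interx2): the transaction number of each instance,
    the cut neighbor pairs, and the size 2 interX instances (as sorted id pairs).
    """
    # Assumption: cell side d / sqrt(2), so each cell is a clique under R.
    side = d / math.sqrt(2)
    cell = {pid: (math.floor(x / side), math.floor(y / side))
            for pid, (_, x, y) in points.items()}
    grid = defaultdict(list)
    for pid, c in cell.items():
        grid[c].append(pid)
    trans_no = {pid: t for t, c in enumerate(sorted(grid)) for pid in grid[c]}

    # A neighbor within distance d lies at most floor(d / side) + 1 = 2 cells away.
    reach = math.floor(d / side) + 1
    cut = set()
    for pid, (cx, cy) in cell.items():
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                if dx == 0 and dy == 0:
                    continue  # same cell means same transaction, never a cut
                for qid in grid.get((cx + dx, cy + dy), []):
                    if pid < qid and math.dist(points[pid][1:], points[qid][1:]) <= d:
                        cut.add((pid, qid))

    # B: instances having at least one cut neighbor relation.
    b = {i for pair in cut for i in pair}
    # Size 2 interX instances need two different event types.
    interx2 = {(p, q) for p, q in cut if points[p][0] != points[q][0]}
    for members in grid.values():
        for p, q in combinations(sorted(m for m in members if m in b), 2):
            if points[p][0] != points[q][0]:
                interx2.add((p, q))
    return trans_no, cut, interx2
```

The smaller cell side trades more transactions, and therefore more cut relations, for a guaranteed clique per cell. That choice is exactly the kind of transactionization effect the text defers to future work.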

Generation of candidate co-locations: We use apriori_gen [1] for generating candidate co-location sets. Size k + 1 candidate co-locations are generated from size k prevalent co-locations. The anti-monotonic property of the participation index makes event-level pruning feasible.
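A compact version of Apriori candidate generation is sketched below. It joins size k co-locations that share a common (k − 1)-prefix, then prunes any candidate with a size k subset that is not prevalent, which is the event-level pruning justified by Lemma 1. Representing co-locations as sorted tuples of event types, and the function name, are assumptions of this sketch.

```python
from itertools import combinations


def apriori_gen(prevalent_k):
    """Generate size k+1 candidates from size k prevalent co-locations (sorted tuples)."""
    prev = set(prevalent_k)
    ordered = sorted(prev)
    candidates = set()
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            # Join step: same first k-1 event types; p < q keeps the result sorted.
            if p[:-1] == q[:-1]:
                cand = p + (q[-1],)
                # Prune step: every size k subset must itself be prevalent.
                if all(sub in prev for sub in combinations(cand, len(p))):
                    candidates.add(cand)
    return candidates
```

For example, if {A, B} and {A, C} are prevalent but {B, C} is not, no size 3 candidate is produced, because {A, B, C} has the non-prevalent subset {B, C}.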

Inputs
  E: a set of boolean spatial event types
  S: a set of instances <event type, event instance id, location>
  R: a spatial neighbor relation
  min_prev: prevalence value threshold
  min_cond_prob: conditional probability threshold
Output
  A set of all prevalent co-location rules with participation index greater than min_prev and conditional probability greater than min_cond_prob
Variables
  k: co-location size
  T: a set of transactions
  C_k: a set of size k candidate co-locations
  P_k: a set of size k prevalent co-locations
  R_k: a set of size k co-location rules
  IntraX_k: intraX table instances of C_k
  InterX_k: interX table instances of C_k, P_k
Method
1) (T, InterX_2) = transactionize(S, R);
2) k = 1; C_1 = E; P_1 = E;
3) while (not empty P_k) do {
4)   C_{k+1} = gen_candidate_co-location(P_k);
5)   for all transaction t ∈ T
6)     IntraX_{k+1} = gather_intraX_instances(C_{k+1}, t);
7)   if k ≥ 2
8)     InterX_{k+1} = gen_interX_instances(C_{k+1}, InterX_k, R);
9)   P_{k+1} = select_prevalent_co-location
10)      (C_{k+1}, IntraX_{k+1} ∪ InterX_{k+1}, min_prev);
11)  R_{k+1} = gen_co-location_rule(P_{k+1}, min_cond_prob);
12)  k = k + 1;
13) }
14) return ∪(R_2, ..., R_{k+1});

Algorithm 1: Partial join co-location algorithm

Scanning transactions and gathering intraX instances: In each iteration step, the transactions are scanned and the intraX instances of candidate co-locations are enumerated. This step is similar to the Apriori algorithm. However, notice that the transactions of a spatial event dataset differ from the transactions of a market basket dataset. A traditional market basket transaction has only boolean item types, i.e., an item is either present in a transaction or not. By contrast, each item of our neighborhood transaction consists of an event type and its instance id, as described in Figure 3. One event type can have several instances in a transaction. To reuse an efficient trie data structure [4, 7] in determining instances of candidate co-locations in a transaction, we convert several items of the same event type with different instance ids into one event type item having a bitmap structure [5] in which the corresponding instance id bits are set. The converted transactions are searched for gathering intraX instances of co-locations. Figure 3 shows a conceptual set of intraX table instances. In practice, all instances are enumerated in the trie structure of itemsets using bitmaps.
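The bitmap conversion and the enumeration of intraX instances inside one transaction can be sketched as follows. The sketch deliberately replaces the trie with a plain per-transaction dictionary and uses Python integers as bitmaps; these simplifications and the function names are hypothetical. The example transaction {A.3, C.1, C.3} is the one described for Figure 3. Because a transaction is a clique, every combination of one set bit per event type is a valid intraX instance, so no neighbor test is needed.

```python
from itertools import product


def to_bitmaps(transaction):
    """Collapse items (event_type, instance_id) into one bitmap per event type:
    bit i is set when instance i of that type is in the transaction."""
    bitmaps = {}
    for event, inst_id in transaction:
        bitmaps[event] = bitmaps.get(event, 0) | (1 << inst_id)
    return bitmaps


def set_bits(bitmap):
    """Instance ids whose bits are set, in increasing order."""
    ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids


def gather_intrax(candidate, bitmaps):
    """IntraX instances of a candidate (tuple of event types) in one transaction:
    one set bit per event type, combined by cross product (all members are neighbors)."""
    if not all(e in bitmaps for e in candidate):
        return []
    choices = product(*(set_bits(bitmaps[e]) for e in candidate))
    return [tuple(zip(candidate, ids)) for ids in choices]
```

For the transaction {A.3, C.1, C.3}, the candidate {A, C} yields (A.3, C.1) and (A.3, C.3), while {A, B} yields nothing because B is absent.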

Generation of interX table instances: The interX table instances of C_{k+1}, k ≥ 2, are generated from the interX table instances of C_k using the generalized apriori_gen function [15]. The SQL-like syntax is described below.


forall co-location c_{k+1} ∈ C_{k+1}
    insert into c_{k+1}.interX_table_instance
    select p.instance_1, p.instance_2, ..., p.instance_k, q.instance_k
    from C_k.interX_table_instance_1 p, C_k.interX_table_instance_2 q
    where (p.instance_1, ..., p.instance_{k-1}) = (q.instance_1, ..., q.instance_{k-1})
      and (p.instance_k, q.instance_k) ∈ R;
end;

In Figure 3, the interX table instance of {A, B}, containing {A.3, B.3}, and the interX table instance of {A, C}, containing {A.1, C.1} and {A.3, C.1}, are joined to produce the interX table instance of {A, B, C}, which is {A.3, B.3, C.1}.
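The SQL-like instance join can be sketched in Python on the Figure 3 example. The neighbor set `R` is assumed to be the cross-type pairs of the Figure 1(d) tables, and the function name `gen_interx` is hypothetical. Rows are tuples ordered by event type, so the shared (k − 1)-prefix corresponds to the `where` clause and the neighbor test on the last instances corresponds to `(p.instance_k, q.instance_k) ∈ R`.

```python
# Neighbor pairs from the Figure 1(d) tables, stored as unordered pairs.
R = {frozenset(p) for p in [
    ("A.1", "B.1"), ("A.2", "B.4"), ("A.3", "B.3"), ("A.4", "B.3"),
    ("A.1", "C.1"), ("A.2", "C.2"), ("A.3", "C.1"), ("A.3", "C.3"),
    ("B.3", "C.1"), ("B.4", "C.2")]}


def neighbors(a, b):
    return frozenset((a, b)) in R


def gen_interx(table_p, table_q, is_neighbor):
    """Instance join of two size k interX table instances (generalized apriori_gen):
    rows agreeing on their first k - 1 instances are combined when their last
    instances are neighbors, giving a size k + 1 row."""
    return [p + (q[-1],)
            for p in table_p for q in table_q
            if p[:-1] == q[:-1] and is_neighbor(p[-1], q[-1])]


INTERX_AB = [("A.3", "B.3")]
INTERX_AC = [("A.1", "C.1"), ("A.3", "C.1")]
INTERX_ABC = gen_interx(INTERX_AB, INTERX_AC, neighbors)
```

Only the pair ((A.3, B.3), (A.3, C.1)) shares the prefix A.3, and B.3 and C.1 are neighbors, so the result is the single interX instance (A.3, B.3, C.1).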

Selection of Prevalent Co-locations: The participation index of co-location C_{k+1} is calculated from the union of the intraX table instance(C_{k+1}) and the interX table instance(C_{k+1}). Candidate co-locations are pruned using a given prevalence threshold, min_prev. In Figure 3, co-location {B, C} has two instances: one is an intraX instance, {B.4, C.2}, and the other is an interX instance, {B.3, C.1}. The participation index of co-location {B, C} is min{2/5, 2/3} = 2/5. If min_prev is given as 1/2, the candidate co-location {B, C} is pruned because its prevalence measure is less than 1/2.
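The selection step can be sketched as follows, replaying the {B, C} example. The sketch assumes instance counts of 4, 5 and 3 for A, B and C, as used in the text's ratios. It takes the union of intraX and interX rows as a set so that a row that is both intraX and interX is counted once, and it keeps a candidate only when PI > min_prev, mirroring the problem definition. The names are hypothetical.

```python
from fractions import Fraction

# Instance counts per event type in the running example.
TOTAL = {"A": 4, "B": 5, "C": 3}


def participation_index(rows, events, total):
    """PI over the rows of a co-location; columns follow sorted event order."""
    order = sorted(events)
    return min(Fraction(len({r[order.index(e)] for r in rows}), total[e]) for e in order)


def select_prevalent(candidates, min_prev, total):
    """Keep candidates whose PI, computed on intraX ∪ interX rows, exceeds min_prev.

    candidates: dict mapping an event tuple to (intraX rows, interX rows).
    """
    kept = {}
    for events, (intrax, interx) in candidates.items():
        pi = participation_index(set(intrax) | set(interx), events, total)
        if pi > min_prev:
            kept[events] = pi
    return kept


CANDIDATES = {("B", "C"): ([("B.4", "C.2")], [("B.3", "C.1")])}
```

With min_prev = 1/2, {B, C} (PI = 2/5) is pruned. With a hypothetical lower threshold of 1/3, it would be kept.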

Generation of Co-location Rules: This step generates all co-location rules with conditional probability above a given min_cond_prob.
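Rule generation can be sketched as an enumeration over every split of a prevalent co-location into a non-empty antecedent C1 and consequent C2. Each split is scored with the conditional probability estimate from Section 2. The sketch assumes that the sizes of the antecedent table instances are supplied by the caller (for a single event type, its instance count) and keeps rules with Pr(C2 | C1) > min_cond_prob. The function name and argument layout are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def gen_rules(colocation, table, table_size, min_cond_prob):
    """Enumerate rules C1 -> C2 for every split of a prevalent co-location into
    non-empty, disjoint C1 and C2, keeping those with Pr(C2 | C1) > min_cond_prob.

    table: rows of the co-location's table instance, columns in sorted event order.
    table_size: maps a sorted tuple of event types to |table instance| of that co-location.
    """
    events = tuple(sorted(colocation))
    rules = []
    for r in range(1, len(events)):
        for lhs in combinations(events, r):
            rhs = tuple(e for e in events if e not in lhs)
            cols = [events.index(e) for e in lhs]
            # Projection onto C1 with duplicate elimination.
            projected = {tuple(row[c] for c in cols) for row in table}
            cp = Fraction(len(projected), table_size[lhs])
            if cp > min_cond_prob:
                rules.append((lhs, rhs, cp))
    return rules


TABLE_AC = [("A.1", "C.1"), ("A.2", "C.2"), ("A.3", "C.1"), ("A.3", "C.3")]
SIZES = {("A",): 4, ("C",): 3}
```

For {A, C} with a hypothetical min_cond_prob of 4/5, only {C} → {A} (conditional probability 1) survives. The rule {A} → {C} has conditional probability 3/4.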

5. ANALYSIS OF THE PARTIAL JOIN
CO-LOCATION MINING ALGORITHM

In this section, we analyze the partial join co-location mining algorithm for completeness, correctness and computational complexity. Completeness implies that no co-location rule satisfying the given prevalence and conditional probability thresholds is missed. Correctness means that the participation index values and conditional probabilities of the generated co-location rules meet the user-specified thresholds.

5.1 Completeness and Correctness

Lemma 3. The partial join co-location mining algorithm is correct.

PROOF. The partial join co-location mining algorithm is correct if the co-location patterns produced by Algorithm 1 meet the thresholds of prevalence value and conditional probability. First, we show that intraX instances and interX instances are correct with respect to the neighbor relation. Step 1 in Algorithm 1 generates neighborhood transactions according to Definition 1. Thus the intraX instances gathered in step 6 are correct with respect to the neighbor relation. The correctness of the interX instances generated in step 8 follows from the correctness of the generalized apriori_gen algorithm [15]. That is, all event instances of a generated interX instance are neighbors of each other. Second, step 9 ensures that only prevalent co-location sets are selected. Thus step 11 correctly returns co-location rules above the given thresholds. □

Lemma 4. The partial join co-location mining algorithm is complete.


PROOF. We prove that if a co-location is prevalent, it is found by Algorithm 1. First, the monotonicity of the participation index in Lemma 1 proves the completeness of the event-level pruning of candidate co-locations using apriori_gen in step 4. Second, we show that the intraX table instances and the interX table instances generated by Algorithm 1 are complete, which implies that all instances of co-locations are complete according to Lemma 2. All intraX table instances are completely found by the Apriori algorithm in step 6. The size 2 interX table instances generated in step 1 are a superset of all neighboring instances necessary to generate size k + 1 (k ≥ 2) interX instances. In step 8, the completeness of the instance join method for generating interX instances is the same as that of generalized apriori_gen [15]. In step 11, the enumeration of the subsets of each of the prevalent co-locations ensures that no spatial co-location rules satisfying the given prevalence and conditional probability thresholds are missed. □

5.2 Computational Complexity Analysis

This section compares the computational cost of the join-based co-location mining algorithm and the partial join algorithm. Let T_jb(k + 1) and T_pj(k + 1) represent the costs of iteration k of the join-based algorithm and the partial join algorithm, respectively.

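$$
\begin{aligned}
T_{jb}(k+1) &= T_{gen\_candi}(P_k) + T_{gen\_inst}(\text{table\_insts of } P_k) + T_{prune}(C_{k+1}) \\
&\approx T_{gen\_inst}(\text{table\_insts of } P_k)
\end{aligned}
$$

$$
\begin{aligned}
T_{pj}(k+1) &= T_{gen\_candi}(P_k) + T_{gath\_intraX\_inst}(\text{transactions}) \\
&\quad + T_{gen\_interX\_inst}(\text{interX\_table\_insts of } P_k) + T_{prune}(C_{k+1}) \\
&\approx T_{gen\_interX\_inst}(\text{interX\_table\_insts of } P_k)
\end{aligned}
$$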

In the above equations, $T_{gen\_candi}(P_k)$ is the cost of generating size $k+1$ candidate co-locations from the prevalent size $k$ co-locations $P_k$. $T_{gen\_inst}(\text{table\_insts of } P_k)$ is the cost of generating table instances of the size $k+1$ candidate co-locations from size $k$ table instances. $T_{gath\_intraX\_inst}(\text{transactions})$ is the cost of scanning transactions and gathering the intraX instances of the size $k+1$ candidate co-locations. $T_{gen\_interX\_inst}(\text{interX\_table\_insts of } P_k)$ is the cost of generating interX table instances of the size $k+1$ candidate co-locations from size $k$ interX table instances. $T_{prune}(C_{k+1})$ is the cost of pruning non-prevalent size $k+1$ co-locations.
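Since the instance generation term dominates both cost expressions, it helps to see what one instance join does. The following Python sketch joins the size $k$ table instances of two co-locations that share their first $k-1$ event types, in the spirit of the generalized apriori-gen of [15]. It rests on several assumptions not stated in the text: instances are identified by string ids, `coords` maps each id to an (x, y) location, the neighbor relation is Euclidean distance at most `r`, and the shared prefix is matched with a dictionary (a hash join), which may differ from how [15] orders the join. The function name `instance_join` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coords = Dict[str, Tuple[float, float]]
Row = Tuple[str, ...]  # one table instance: instance ids, one per event type

def instance_join(insts1: List[Row], insts2: List[Row],
                  coords: Coords, r: float) -> List[Row]:
    """Join size k table instances of {f1..f(k-1), fk} (insts1) with those of
    {f1..f(k-1), fk'} (insts2) into size k+1 table instances of
    {f1..f(k-1), fk, fk'}. Rows join when they agree on their first k-1
    instance ids and their last instances are neighbors (distance <= r)."""
    # Index insts2 by its (k-1)-prefix: a hash join on the shared prefix.
    by_prefix: Dict[Row, List[str]] = {}
    for row in insts2:
        by_prefix.setdefault(row[:-1], []).append(row[-1])

    joined: List[Row] = []
    for row in insts1:
        for last in by_prefix.get(row[:-1], []):
            # Both input rows are already cliques, so only the pair of last
            # instances still needs a neighbor check.
            if math.dist(coords[row[-1]], coords[last]) <= r:
                joined.append(row + (last,))
    return joined
```

For example, with A1 = (0, 0), B1 = (1, 0), C1 = (0, 1) and r = 1.5, the {A, B} row (A1, B1) and the {A, C} row (A1, C1) join into the {A, B, C} row (A1, B1, C1), because B1 and C1 are about 1.41 apart. The number of rows fed into this join is what the cost ratio below compares.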

The bulk of the time is consumed in generating instances. We assume that the cost of gathering intraX instances from transactions is small relative to the instance join cost, and that the other terms, $T_{gen\_candi}(P_k)$ and $T_{prune}(C_{k+1})$, are negligible. Thus the computational ratio of the partial join algorithm to the join-based algorithm can be simplified as

$$\frac{T_{pj}(k+1)}{T_{jb}(k+1)} \approx \frac{T_{gen\_interX\_inst}(\text{interX\_table\_insts of } P_k)}{T_{gen\_inst}(\text{table\_insts of } P_k)}$$

The computational ratio is governed by the size of the interX table instances relative to the size of the table instances of co-location $P_k$. The dominant factors affecting the number of interX instances and the total number of instances are likely the number of cut neighbor relations and the data density of the neighborhood area. When the number of cut neighbor relations is fixed and the data density in a neighborhood area grows, the size of the table instances increases rapidly, and the cost of generating them becomes much greater than the cost of generating interX table instances. By contrast, as the number of cut neighbor relations increases, the size of the interX table instances increases, and so does the average cost of generating them. When every instance has cut neighbor relations, all instances are involved in interX table instances, so the cost of generating the interX table instances is similar to the cost of generating table instances in the join-based algorithm. In our experiments, described in the next section, we use the data density in the neighborhood area and the ratio of cut neighbor relations as key parameters for evaluating the algorithms. We therefore expect the partial join approach to be more efficient than the join-based method when the locations of spatial events are clustered in neighborhood areas and the number of cut neighbor relations is small.
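The size 2 case of this trade-off can be made concrete with a small Python sketch. It transactionizes points with a regular grid of cell size `d` (as in the experiments below) and splits the size 2 neighbor pairs, pairs of different event types within distance `r`, into intraX pairs, whose two instances fall in the same cell and can be gathered from one transaction, and cut pairs, which cross a cell boundary and must be handled as interX instances. The function name, the tuple layout and the brute-force $O(n^2)$ pair scan are illustrative assumptions; the cut fraction is only a rough proxy for the cost ratio above, because joins of larger sizes are not counted.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float, str]  # (x, y, event_type)

def count_neighbor_pairs(points: List[Point], d: float, r: float) -> Tuple[int, int]:
    """Return (total, cut): the number of size 2 neighbor pairs and how many of
    them cross a grid-cell boundary (i.e. are cut neighbor relations)."""
    def cell(p: Point) -> Tuple[int, int]:
        return (int(p[0] // d), int(p[1] // d))

    total = cut = 0
    for p, q in combinations(points, 2):
        if p[2] == q[2]:
            continue  # co-locations relate instances of different event types
        if math.dist(p[:2], q[:2]) <= r:
            total += 1
            if cell(p) != cell(q):
                cut += 1
    return total, cut
```

With the points (1, 1, A), (2, 2, B), (9, 5, C) and (11, 5, A) and d = r = 10, the function returns (5, 2): the two pairs involving the A instance at x = 11 cross the boundary at x = 10. Moving more instances toward cell boundaries raises this fraction, which is the regime in which the analysis above predicts a smaller advantage for the partial join.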


6.  EXPERIMENTAL  EVALUATION 


Figure  4:  Experimental  Design 


We evaluated the performance of the partial join algorithm against the join-based approach using synthetic and real datasets. In Subsection 6.1, we describe the overall experimental design and a synthetic data generator. In Subsection 6.2, we evaluate the computational efficiency gained by our partial join co-location algorithm on synthetic datasets by studying the parameters that affect performance. Subsection 6.3 compares the performance of the algorithms on a real dataset.

6.1  Experiment  Design 

Figure 4 shows the overall experimental layout. Synthetic datasets were generated using a methodology similar to the one used to evaluate the join-based algorithm [15]. We added parameters and procedures to it to generate transactionized instances and cut neighbor relations. The synthetic data generator gives better control when studying the effects of the parameters of interest. First we describe the layout of the overall spatial framework. For simple transactionization of a spatial dataset, we imposed a grid of cells of neighborhood size $d \times d$ on a rectangular spatial framework of size $D_1 \times D_2$. Each grid cell is implicitly divided into two parts, a core area and an overlapping area. Event instances in the core area have neighbor relationships only with instances in their own grid cell. By contrast, instances in the overlapping area also have neighbor relations with instances in neighboring cells. This area was used for generating cut neighbor relations.


The synthetic spatial datasets were generated as follows. Given a number of base co-location patterns, $N_{co\_loc}$, the size $m$ of each co-location was drawn from a Poisson distribution with mean $\lambda_1$. We assigned randomly chosen sets of event types to the co-location patterns. The number of base instances $n_2$ of each co-location was drawn from another Poisson distribution with mean $\lambda_2$. Our data generator is also controlled by two other parameters, the cut instance ratio $\alpha$ and the spatial framework size $\beta$. The cut instance ratio controls the number of cut neighbor relations in the experiment: $(1 - \alpha) \cdot n_2$ instances were generated in the core area of a randomly chosen cell, and $\alpha \cdot n_2$ instances were generated over its overlapping area and the overlapping areas of its neighboring cells. To control data density while keeping the dataset size fixed, we varied the size of the overall spatial framework: to increase density, we used a smaller spatial framework with the same neighborhood size $d \times d$.
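A generator along these lines might look like the Python sketch below. It is a hedged reconstruction, not the authors' generator, and several details are assumptions because the text leaves them open: each co-location instance gets its own randomly chosen cell, co-location sizes are clamped to at least 2, event types are drawn from a hypothetical pool of `n_types` letters, cut instances straddle a vertical cell boundary, and the overlapping strip has a hypothetical width `overlap`. Poisson samples use Knuth's multiplication method because Python's `random` module has no Poisson sampler.

```python
import math
import random
from typing import List, Tuple

# (x, y, event_type, instance_id, is_cut_instance)
Row = Tuple[float, float, str, int, bool]

def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; fine for small means such as 4 or 50."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def generate(n_colocs: int, lam1: float, lam2: float, alpha: float,
             frame: float, d: float, n_types: int = 26,
             overlap: float = 2.0, seed: int = 0) -> List[Row]:
    rng = random.Random(seed)
    types = [chr(ord("A") + i) for i in range(n_types)]
    cells = int(frame // d)  # cells per side; assumes frame >= 2 * d
    rows: List[Row] = []
    inst_id = 0
    for _ in range(n_colocs):
        m = min(n_types, max(2, poisson(rng, lam1)))  # clamp size to [2, n_types]
        events = rng.sample(types, m)                  # event types of this pattern
        n2 = poisson(rng, lam2)                        # number of base instances
        n_cut = round(alpha * n2)                      # alpha * n2 cut instances
        for j in range(n2):
            is_cut = j < n_cut
            cy = rng.randrange(cells)
            ys = (cy * d + overlap, (cy + 1) * d - overlap)
            if is_cut:
                # Spread over the overlapping strips on both sides of the
                # boundary between two horizontally adjacent cells.
                bx = (rng.randrange(cells - 1) + 1) * d
                xs = (bx - overlap, bx + overlap)
            else:
                # Keep the instance away from the cell border (core area).
                cx = rng.randrange(cells)
                xs = (cx * d + overlap, (cx + 1) * d - overlap)
            for e in events:
                rows.append((rng.uniform(*xs), rng.uniform(*ys), e, inst_id, is_cut))
            inst_id += 1
    return rows
```

With the common parameters of Subsection 6.2, a call such as `generate(20, 4, 50, 0.2, 400, 10)` produces roughly a thousand co-location instances, each of whose points lie within distance d of one another. Shrinking `frame` while keeping `d` fixed raises the density, mirroring the control described above.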

The partial join co-location algorithm and the join-based co-location algorithm were executed on the generated spatial datasets and on a real climate dataset from NASA. The performance of the two algorithms was evaluated by execution time. The average co-location size and the average number of co-location instances in the generated datasets are likely to differ from the initial parameter values, both because cut instances are generated and because of the spatial framework size chosen to control data density. We will address the effect of these parameters and of noise data on performance in future work. All experiments were performed on a Sun SunBlade 1500 with 1.0 GB of main memory and a 177MHz CPU.

6.2  Performance  Study 

The experiment was conducted using detailed simulations to answer the following questions:

1. How does the ratio of cut neighbor relations to total neighbor relations affect performance?

2. How does data density in the neighborhood area affect performance?

3. How do the algorithms behave with different prevalence thresholds?

The common parameter values used in these experiments were as follows: the neighborhood size defining a co-location, $d \times d$, is $10 \times 10$; the number of base co-locations, $N_{co\_loc}$, is 20; the average size of co-location patterns, $\lambda_1$, is 4; and the average number of instances per co-location, $\lambda_2$, is 50.

Effect of ratio of cut neighbor relations: The effect on performance of the ratio of cut neighbor relations to total neighbor relations was evaluated with synthetic datasets generated using the common parameters above and different cut instance ratios, i.e., 0, 0.1, 0.2, 0.3, etc. The size of the overall spatial framework was fixed at $400 \times 400$. The prevalence threshold was set to 0.2.

Figure 5 shows the execution time of both algorithms, the partial join and the join-based, as a function of the cut neighbor relation ratio. The ratio of cut neighbor relations to total neighbor relations was controlled by the cut instance ratio in the experiment. The overall execution time increased as the ratio increased. This is because, as the ratio of cut relations becomes larger, the size of the interX table instances increases, so more instances are involved in the join operation and the execution time grows. The join-based algorithm also shows an increase in its execution time. This happens because the number of instances in the overlapping area increases, making neighbor relations with instances in nearby cells more likely and thus generating many neighborhood instances. The average size of table instances also increases. The performance difference between the two algorithms narrows as the number of cut neighbor relations increases. When all event instances were involved in cut neighbor relations, the two algorithms showed similar execution times.

Figure 5: Effect of ratio of cut neighbor relations over total neighbor relations


Table 1: A comparison of size 2 instances

Figure 6: Effect of data density on neighborhood
Effect of data density in the neighborhood: The effect of data density in the neighborhood area was evaluated with spatial datasets generated using the common parameters above and spatial frameworks of different sizes $\beta$, i.e., $500 \times 500$, $400 \times 400$, $360 \times 360$, etc., to control the data density in the neighborhood. The cut instance ratio $\alpha$ was fixed at 0.1 and the prevalence threshold was set to 0.2. The density value was calculated from the generated dataset as the average number of instances in a neighborhood area divided by the area of the square neighborhood, $10 \times 10$. Because the cut instance ratio is fixed, the increase in data density in this experiment mainly affects the data density in the core area.
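The density measure can be sketched in a few lines of Python. Two assumptions are made here because the text does not settle them: the neighborhood areas are taken to be the $d \times d$ grid cells of Subsection 6.1, and the average is taken over non-empty cells only (averaging over all cells would simply give the global density). The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

def neighborhood_density(points: Iterable[Tuple[float, float]], d: float) -> float:
    """Average number of instances per non-empty d x d cell, divided by d * d."""
    counts = Counter((int(x // d), int(y // d)) for x, y in points)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts) / (d * d)
```

For instance, the points (1, 1), (2, 2) and (15, 5) with d = 10 occupy two cells holding 2 and 1 instances, giving 1.5 / 100 = 0.015.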

Figure 6 illustrates the performance gain of the partial join algorithm. As the density increases, the execution time of the join-based algorithm increases dramatically. By contrast, the partial join algorithm is barely affected by the data density in the neighborhood area. A small increase of data in the neighborhood area does not much affect the transaction-based algorithm, while the join-based method is highly sensitive to even a small increase in data density. Table 1 compares the number of size 2 interX instances generated by the partial join algorithm with the number of size 2 instances generated by the join-based algorithm. These instances take part in the instance join operations that generate size 3 instances. As can be seen, the partial join approach produced far fewer instances than the join-based method in this experiment.

Effect of prevalence threshold: Figure 7 shows how performance changes as the prevalence threshold increases. The experiment was conducted with the common parameters above, a $400 \times 400$ spatial framework and a cut instance ratio of 0.3. The partial join approach performed much better than the join-based approach when the threshold values were low. However, the gap decreased dramatically as the threshold value increased. The reason is that higher thresholds prune more of the event-level search space, which reduces the number of instance joins.
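The event-level pruning mentioned above depends on the prevalence measure. As a sketch, assuming the prevalence measure is the participation index used by the join-based algorithm [15] (the minimum, over the event types of a co-location, of the fraction of that type's instances that appear in at least one table instance), the prevalence check might look like this in Python. The function names and argument layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def participation_index(events: Sequence[str],
                        table_instances: List[Tuple[str, ...]],
                        instance_counts: Dict[str, int]) -> float:
    """events[i] is the event type of column i in each table-instance row;
    instance_counts maps an event type to its total number of instances."""
    ratios = []
    for col, event in enumerate(events):
        # Distinct instances of this event type taking part in the co-location.
        participating = {row[col] for row in table_instances}
        ratios.append(len(participating) / instance_counts[event])
    return min(ratios)

def is_prevalent(events: Sequence[str],
                 table_instances: List[Tuple[str, ...]],
                 instance_counts: Dict[str, int],
                 threshold: float) -> bool:
    return participation_index(events, table_instances, instance_counts) >= threshold
```

Because this index can only stay the same or fall as a co-location grows, raising the threshold removes more candidates at the event level, before any of their instances are joined.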
Figure 7: Effect of prevalence threshold

6.3  Experiment  on  a  Real  Dataset 

We evaluated the partial join algorithm and the join-based algorithm using a NASA climate dataset of the U.S. region. All events were extracted with a Z-score transformation [16] at a threshold of 1.5. There were 18 event types and 15,515 event instances in total. With a neighborhood distance threshold of 4, the total number of size 2 neighborhood instances was 390,392, and the number of size 2 cut neighborhood instances (size 2 interX instances) was 314,078. With a prevalence threshold of 0.1, the maximum co-location size was 5. Figure 8 presents the execution time of the two algorithms as a function of the prevalence threshold. The partial join method shows relatively better performance than the join-based method.
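As a rough illustration only, treating the share of size 2 neighborhood instances that are cut as a proxy for the cost ratio of Section 5.2 (the proxy ignores joins of larger sizes), the counts above give

$$\frac{314{,}078}{390{,}392} \approx 0.80,$$

that is, about four in five size 2 neighborhood instances in this dataset are interX instances. This places the real dataset toward the high cut-ratio regime in which the cost analysis and the results of Figure 5 indicate a smaller advantage for the partial join.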
Figure 8: A comparison using a real dataset


7.  CONCLUSION  AND  FUTURE  WORK 

In this paper, we identified the limitations of the current co-location mining algorithm and proposed a novel partial-join approach for mining complete and correct co-location patterns. This approach transactionizes continuous spatial data while keeping track of the spatial information not modeled by transactions. To realize this approach, we proposed an efficient partial join co-location algorithm that applies the instance join method within the framework of the Apriori algorithm. We provided an algebraic cost model to characterize the conditions under which the partial-join approach or the join-based method performs better. The performance study showed that our approach is computationally more efficient and is especially robust to increases in neighborhood data density.

In future work, first, we plan to develop an alternative efficient join algorithm for generating instances of co-locations without including duplicate instances in intraX table instances and interX table instances. This approach will further reduce the number of instance joins, and the resulting algorithm will be robust on any dataset, e.g., dense datasets with many cut neighborhoods. Second, we used a regular grid-based transactionization. We plan to examine different transactionization methods, e.g., maximal cliques [3], max-clique agglomerative clustering [20], min-cut partitioning [6], etc. Third, recent work [19] on co-location mining presents a general method to find the maximal patterns of reference-feature-centric co-locations and clique co-locations under memory constraints. It uses spatial join techniques, e.g., partition-based spatial join [9] and multiway spatial join [11]. We plan to compare our approach to this spatial join-based method. Finally, although current co-location patterns are defined over spatial features, data in many applications include spatio-temporal features. Thus we also plan to explore a co-location mining method for spatio-temporal datasets.